Banishing Bias from Consensus Sequences

Authors

  • Amir Ben-Dor
  • Giuseppe Lancia
  • Jennifer Perone
  • R. Ravi

Abstract

With the exploding size of genome databases, it is becoming increasingly important to devise search procedures that extract relevant information from them. One such procedure is particularly effective in finding new, distant members of a given family of related sequences: start with a multiple alignment of the given members of the family, and use an integral or fractional consensus sequence derived from the alignment to further probe the database. However, the multiple alignment constructed to begin with may be biased due to skew in the sample of sequences used to construct it. We suggest strategies to overcome the problem of bias in building consensus sequences. When the intention is to build a fractional consensus sequence, often termed a profile, we propose assigning weights to the sequences such that the resulting fractional sequence has roughly the same similarity score against each of the sequences in the family. We call such fractional consensus sequences balanced profiles. On the other hand, when only regular sequences can be used in the search, we propose that the consensus sequence have minimum maximum distance from any sequence in the family to avoid bias. Such sequences are NP-hard to compute exactly, so we present an approximation algorithm with a very good performance ratio based on randomized rounding of an integer programming formulation of the problem. We also mention applications of the rounding method to selection of probes for disease detection and to construction of consensus maps.
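The abstract does not say how the balanced-profile weights are computed, so the sketch below is a hypothetical illustration of the goal, not the authors' algorithm. It assumes equal-length aligned sequences, an identity scoring scheme (the profile's frequency of a sequence's residue, summed over columns) in place of a substitution matrix, and a swapped-in multiplicative-weights update that raises the weight of sequences the profile scores below the mean. The names `weighted_profile`, `profile_score`, `balanced_weights`, `iterations`, and `eta` are invented for this example.

```python
import math
from collections import Counter


def weighted_profile(seqs, weights):
    """Fractional consensus: weighted residue frequencies for each alignment column."""
    total = sum(weights)
    profile = []
    for j in range(len(seqs[0])):
        column = Counter()
        for seq, w in zip(seqs, weights):
            column[seq[j]] += w / total
        profile.append(column)
    return profile


def profile_score(profile, seq):
    """Identity scoring (an assumption): sum of the profile's frequency of seq's residue per column."""
    # Counter returns 0 for residues never seen in a column.
    return sum(column[ch] for column, ch in zip(profile, seq))


def balanced_weights(seqs, iterations=500, eta=0.5):
    """Hypothetical heuristic: reweight until the profile scores each sequence about equally."""
    n, length = len(seqs), len(seqs[0])
    weights = [1.0 / n] * n
    for _ in range(iterations):
        profile = weighted_profile(seqs, weights)
        # Length-normalized scores keep the step size eta independent of alignment length.
        scores = [profile_score(profile, s) / length for s in seqs]
        mean = sum(scores) / n
        # Raise the weight of under-represented sequences, lower it for over-represented ones.
        weights = [w * math.exp(eta * (mean - sc)) for w, sc in zip(weights, scores)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights


if __name__ == "__main__":
    family = ["AAAA", "AAAA", "AAAA", "CCCC"]  # skewed sample: three identical members
    uniform = weighted_profile(family, [1.0] * len(family))
    print([profile_score(uniform, s) for s in family])  # unweighted profile favors AAAA
    w = balanced_weights(family)
    balanced = weighted_profile(family, w)
    print([round(x, 3) for x in w])
    print([round(profile_score(balanced, s), 3) for s in family])
```

On this skewed family, the unweighted profile scores 3.0 against AAAA but only 1.0 against CCCC; after reweighting, CCCC carries about half of the total weight and every member scores about 2.0. The update is a generic heuristic chosen for brevity and is not guaranteed to equalize scores on every family.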

Similar resources

Strong spurious transcription likely a cause of DNA insert bias in typical metagenomic clone libraries

Background: Clone libraries provide researchers with a powerful resource with which to study nucleic acid from diverse sources. Metagenomic clone libraries in particular have aided in studies of microbial biodiversity and function, as well as allowed the mining of novel enzymes for specific functions of interest. These libraries are often constructed by cloning large inserts (~30 kb) into a cos...

Strong spurious transcription likely contributes to DNA insert bias in typical metagenomic clone libraries

BACKGROUND Clone libraries provide researchers with a powerful resource to study nucleic acid from diverse sources. Metagenomic clone libraries in particular have aided in studies of microbial biodiversity and function, and allowed the mining of novel enzymes. Libraries are often constructed by cloning large inserts into cosmid or fosmid vectors. Recently, there have been reports of GC bias in ...

Molecular typing of avian Escherichia coli isolates by enterobacterial repetitive intergenic consensus sequences-polymerase chain reaction (ERIC-PCR)

BACKGROUND: Colibacillosis is one of the most economically important diseases of poultry worldwide. OBJECTIVES: This study was conducted to examine the clonal relatedness and typing of 95 avian Escherichia coli isolates by ERIC-PCR. METHODS: Sixty-three E. coli isolates from two common manifestations of colibacillosis (yolk sac infection and colisepticemia) and 32 isolates from feces of apparen...

Codon bias patterns in photosynthetic genes of halophytic grass Aeluropus littoralis

Codon bias refers to differences in the frequency of occurrence of synonymous codons in coding DNA. Patterns of codon usage and optimal codon utilization differ significantly among organisms. This difference results from the long-term action of natural selection and evolutionary processes. Genetic drift, mutation, and regulation of gene expression are the main causes of codon bias. In this stu...

Mitochondrial DNA variation in wild and hatchery populations of northern pike, Esox lucius L.

Esox lucius is an economically important freshwater species. Mitochondrial cytb, 12SrRNA, and 16SrRNA gene sequences were used to clarify the genetic variation and population structure in three E. lucius populations, i.e., one wild population (W) and two hatchery populations (Hatchery Population I-HPI and Hatchery Population II-HPII). A total of 55 individuals, with 19 from wild and 1...

Publication date: 1997